Chi-squared statistic for checking that a dice is fair

Author: Leonardo Espin

Date: 5/07/2019

  • To simulate a fair dice with $n$ sides, we sample a uniform random number $x \in [0,1)$; side $i$ corresponds to the segment $(i-1)/n \leq x < i/n$

So we need the largest integer $j=i-1$ such that $j \leq nx$, which is int(n*x); the side is then $i$ = int(n*x)+1
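
As a quick check of the rule above, here is a minimal sketch that applies it to a few hand-picked values of $x$ instead of random ones. The helper name side_from_uniform is hypothetical and only used for illustration; it follows the same int(n*x)+1 rule.

```python
def side_from_uniform(x, n=6):
    """Map x in [0, 1) to a side in 1..n (hypothetical helper for illustration)."""
    # int(n*x) is the largest integer j with j <= n*x, so the side is j + 1
    return int(n * x) + 1

# a few fixed values of x, chosen away from segment boundaries
for x in (0.0, 0.25, 0.5, 0.99):
    print(x, side_from_uniform(x))
```

Because $x < 1$, the result never exceeds $n$, and $x = 0$ maps to side 1.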

In [1]:
import numpy.random as random
def fairDice(n=6):
    #sample x uniformly from [0,1) and map it to a side in 1..n
    x=random.uniform()
    return int(n*x)+1
In [2]:
samples=[fairDice() for i in range(10000)] 
In [4]:
#the dice seems fair when we plot the distribution
import seaborn as sns
#for suppressing seaborn warnings
import warnings
warnings.filterwarnings("ignore")

sns.distplot(samples,bins=6,kde=False)
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f6598f87ba8>
In [5]:
#chi-squared test for the dice:
from scipy.stats import chi2
from scipy.stats import chisquare #just for computing the statistic
N=len(samples)
counts={i:samples.count(i) for i in range(1,7)}
chis=[N*6*(x/N-1/6)**2 for x in counts.values()]
error=sum(chis)#the observed error

print(error)
#the line below returns the observed error and its p-value (the survival
#function, 1 - cumulative dist., evaluated at the error). Here I use it
#only to check that the statistic is properly calculated
print(chisquare(list(counts.values())))

#the critical value: the 95th percentile of a chi^2 distribution with 5 degrees of freedom
critical=chi2.ppf(.95,5)
print('\nThe chi-square critical value for a 5% significance level\nwith 5 degrees of freedom is {:.2f}'.format(critical))
print('\nsince the observed error {:.2f} is less than the critical value\n{:.2f} we can\'t reject the hyp. that the dice is fair'.format(error,critical))
4.301599999999993
Power_divergenceResult(statistic=4.3016, pvalue=0.5068589197634881)

The chi-square critical value for a 5% significance level
with 5 degrees of freedom is 11.07

since the observed error 4.30 is less than the critical value
11.07 we can't reject the hyp. that the dice is fair
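
Comparing the statistic with a critical value is equivalent to comparing its p-value with the chosen significance level. The sketch below illustrates this with chi2.ppf and chi2.sf from scipy.stats; the statistic is copied from the output above, and the variable names (alpha, dof, reject_by_critical, reject_by_p_value) are assumptions made for the example.

```python
from scipy.stats import chi2

statistic = 4.3016   # observed error, copied from the output above
dof = 5              # 6 sides - 1
alpha = 0.05         # significance level (assumed for this example)

critical = chi2.ppf(1 - alpha, dof)   # 95th percentile of chi^2 with 5 d.o.f.
p_value = chi2.sf(statistic, dof)     # survival function: 1 - CDF at the statistic

# the two decision rules always agree
reject_by_critical = statistic > critical
reject_by_p_value = p_value < alpha
print(p_value, reject_by_critical, reject_by_p_value)
```

The p-value computed this way should match the one reported by chisquare, and neither rule rejects the hypothesis that the dice is fair.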

For simulating a loaded dice we can recycle the code for a fair dice above.

  • First we create a prob. distribution for the loaded dice. For example, we could use:

    side         1     2     3     4     5     6
    probability  4/12  1/12  1/12  1/12  4/12  1/12
  • We then sample from a fair dice with 12 sides and create a map with the corresponding sides in the 6-side loaded dice:
In [6]:
twelve2six={1:1,2:1,3:1,4:1,5:2,6:3,7:4,8:5,9:5,10:5,11:5,12:6}
loadedSamples=[fairDice(12) for i in range(10000)] 
loadedSamples=list(map(lambda x:twelve2six[x],loadedSamples))
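
The 12-sided map works because every probability is a multiple of 1/12. As an alternative sketch (not the method used in this notebook), the same loaded dice can be sampled directly from a uniform $x$ by locating $x$ among the cumulative probabilities. The names cumulative and loaded_side are hypothetical.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

# loaded distribution for sides 1..6, as in the table above
probs = [Fraction(4, 12), Fraction(1, 12), Fraction(1, 12),
         Fraction(1, 12), Fraction(4, 12), Fraction(1, 12)]
cumulative = list(accumulate(probs))  # 1/3, 5/12, 1/2, 7/12, 11/12, 1

def loaded_side(x):
    """Map x in [0, 1) to a side: count the cumulative bounds that are <= x."""
    return bisect_right(cumulative, x) + 1

# fixed values of x, one inside each segment
for x in (0.0, 0.34, 0.45, 0.55, 0.6, 0.95):
    print(x, loaded_side(x))
```

For these values of $x$ the result agrees with twelve2six applied to int(12*x)+1.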
In [7]:
#this dice is obviously not fair:
sns.distplot(loadedSamples,bins=6,kde=False)
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f6598ee4518>
In [8]:
#chi-squared test for the dice:
N=len(loadedSamples)
lcounts={i:loadedSamples.count(i) for i in range(1,7)}
#N goes inside the comprehension: N*[...] would repeat the list N times
lchis=[N*6*(x/N-1/6)**2 for x in lcounts.values()]
lerror=sum(lchis)
supercritical=chi2.ppf(.999,5)
print('\nThe chi-square critical value for a 0.1% significance level\nwith 5 degrees of freedom is {:.2f}'.format(supercritical))
print('\nsince the observed error {:.2f} is more than the critical value\n{:.2f} we reject the hyp. that the dice is fair at the 0.1% significance level'.format(lerror,supercritical))
The chi-square critical value for a 0.1% significance level
with 5 degrees of freedom is 20.52

since the observed error 5165.93 is more than the critical value
20.52 we reject the hyp. that the dice is fair at the 0.1% significance level
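
To see why the observed error is so large, it helps to evaluate the same statistic at the exact loaded probabilities instead of the sampled proportions. This is a rough sketch that ignores sampling noise, so it gives only the approximate size to expect; expected_stat is a name introduced for this example.

```python
from fractions import Fraction

N = 10000  # sample size used above
# loaded distribution for sides 1..6
probs = [Fraction(4, 12), Fraction(1, 12), Fraction(1, 12),
         Fraction(1, 12), Fraction(4, 12), Fraction(1, 12)]
fair = Fraction(1, 6)

# same formula as the statistic above, with x/N replaced by the true probability
expected_stat = N * 6 * sum((p - fair) ** 2 for p in probs)
print(float(expected_stat))
```

The result is of the same order as the observed 5165.93, and it grows linearly with N, which is why a large sample exposes the loaded dice so clearly.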